Topological complexity, contact order and protein folding rates 
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Monte Carlo simulations of protein folding show the emergence of a strong correlation between the 
relative contact order parameter, CO, and the folding time, t, of two-state folding proteins for longer 
chains with number of amino acids, N > 54, and higher contact order, CO > 0.17. The correlation 
is particularly strong for N — 80 corresponding to slow and more complex folding kinetics. These 
results are qualitatively compatible with experimental data where a general trend towards increasing 
t with CO is indeed observed in a set of proteins with chain length ranging from 41 to 154 amino 
acids. 
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I. INTRODUCTION 

The search for correlations between the protein folding 
kinetics and the native state equilibrium properties (i.e. 
chain length and stability) presents a major challenge 
for those working in the field of protein folding, both 
in theory and experiments. Progress has been signifi- 
cantly hindered by difficulty analyzing the folding of pro- 
tein molecules larger than about 100 amino acids, whose 
kinetics is widely believed to be based on some multiex- 
ponential mechanismtl. By contrast, for smaller proteins 
whose folding kinetics is close to single exponential, there 
seems to be some consensus as to the dependence of the 
folding time, t, on native state stabilityoQ. 

Experiment and theory appear to be at odds with each 
other over the dependence of the folding time on the num- 
ber N of amino acids inutile folding unit. Recent Monte 
Carlo (MC) simulationsB of a simple lattice model have 
proposed that, for two-state proteins, a scaling law of the 
type t w N x , X ~ 5, appropriately describes the depen- 
dence of the folding time on the chain length, N; a weaker 
dependence (A w 4) has been previously reported in Rcf.6 
for the same model Hamiltonian and distribution of con- 
tact energies, and in Ref.7 for a two-letter alphabet model 
that, apart from the commonly used isotropic contact 
interactions, also considers orientation-dependent inter- 
actions. However, available experimental data shows no 
correlation between t and MffitJ. 

We examine here the influence of the native state ge- 
ometric properties on the protein folding kinetics in the 
context of MC simulation. One simple parameter of the 
geometry which has already attracted attention is con- 
tact order, measuring the average length of the back- 
bone loops connecting contacting pairs of residues in the 
structural Formally, the relative contact order, CO, is 
defined as 



CO 



1 N 



(1) 



where N is the total number of amino acid residues in the 
protein, L is the total number of contacts, and A^j = 1 if 



residues i and j are in contact and is otherwise. is 
the backbone separation between residues i and j. High 
values of CO are associated with protein structures where 
amino acid residues interact on average with others that 
are far away in sequence (long-range interactions), while 
those displaying predominantly local interactions are of 
low contact order. 

A high correlation was found between the CO param- 
eter, and the folding rates for the protein set consid- 
ered in Ref.3: proteins with 'low' contact order tend to 
fold faster than proteins with 'high' contact order. This 
finding strongly supports the view that native geometry 
strongly influences the kinetics of the rate-limiting step 
in the two-state mechanism of small protein molecules 
(N < 100), determining their folding rates. 

The connection between CO and the dominant range 
of residue interactions brings back the controversial issue 
of the importance of local (and non-local) contacts in the 
dynamics of protein folding. An argument against local 
contacts is that they might increase the 'roughness' of 
the energy landscape, and therefore the stability of the 
unfolded stateEl. On the other hand, an argument sup- 
porting local interactions is based on the idea that they 
might provide the ideal substrate for the development of 
nucleation or initiation sites, small local sequence sub- 
structures forming in an early stage of folding and driv- 
ing the subsequent pathwayE3. Moreover, the formation 
of non-local contacts in early folding is entropically costly 
as it restricts tke number of conformations available to 
the folding unittil. 

The limited amount of experimental information 
availableouO suggests that investigating this problem 
within the scope of theoretical models could give more 
insight. 



II. MODELAND METHODS 

To achieve this goal we consider a simple three dimen- 
sional lattice model of a protein molecule whose Hamil- 
tonian is given by the contact approximation, 
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i>j 

where {<7i} stands for an amino acid sequence (oi be- 
ing the chemical identity of bead i) while {fl} is the set 
of bead coordinates that define a certain conformation. 
The contact function, A, equals 1 if beads i and j are 
in contact but not covalently linked, and is otherwise. 
We follow many previous studies in taking the interac- 
tion parameters e from the 20 x 20 Miyazawa-Jerningan 
matrix derived from the distribution of contacts in native 
proteinsE3. Our folding simulations follow the standard 
MC Metropolis algorithmic and the kink-jump MC move 
set (end-move, corner-flip, and crankshaft )E3. 



III. NUMERICAL RESULTS 
A. Contact order and homopolymer kinetics 

We have explored the distribution of the relative con- 
tact order parameter over a population of 2000 maxi- 
mally compact target geometries found by homopolymer 
relaxationE£l. This distribution is shown in Figures 1(a)- 
(e) for each of the studied chain lengths N — 36,48,54,64 
and 80, which are all commensurate with folding to fill 
a simple cuboid. There is only a slight shift in the 
modal contact order with chain length, from 0.19 for 
TV = 36,48 down to 0.17 for higher N. However, the 
target fraction in the histogram tail (CO > 0.22) is sig- 
nificantly smaller for N > 54 than for shorter chains. 
Interestingly, the values of CO found for these lattice pro- 
teins span approximately the same range as those found 
in real kinetically characterized single domain proteins 
(0.0745 <CO< 0.2120)0. 

We have also checked the intrinsic kinetic accessibility 
of the compact configurations obtained, by measuring the 
time t co i for these configurations to be reached under 
homopolymer relaxation. Figure 1(f) shows there is no 
evident correlation between t co i and CO . 

B. Finding the optimal folding temperature 

To investigate the relationship between protein fold- 
ing and contact order, (at least) 20 target conformations 
for each chain length, were selected so as to sample uni- 
formly across the range of contact order. For each tar- 
get an ensemble of 100 designed sequences was prepared 
by using the design method developed by Shakhnovich 
and GutinEZI based on random heteropolymer theory, and 
simulated annealing techniques. The average trained se- 
quence energy, < E > , is shown in Table I along with the 
standard deviation of the energy distribution, a. Except 
for TV = 54, the chemical composition of the designed 



sequences was the same as the one used in Ref.5. In that 
study it was shown that the optimal folding temperature, 
Tf id{N), defined as the temperature that minimizes the 
folding time, is close to a self-averaging parameter. 

Since the Shakhnovich and Gutin design scheme pre- 
serves the overall sequence chemical composition we can 
safely use for 36 and 48 bead long sequences studied here 
the T /o;d (36) and T /oW (48) found in Ref.5. 

For the longer TV, and most particularly N — 80, foldic- 
ity, defined as the fraction of successful folding runs over 
the total number of attempted runs, was for the vast 
majority of the targets less than unity. This forced us to 
define the optimal folding temperature, Tf id(N), is such 
cases as the temperature which optimized foldicity rather 
than the temperature which minimized the folding time. 
The case N — 48 sits at the margin and provides confir- 
mation that the two approaches to Tt i^[N) are not in 
conflict, as shown in Figure. 2. 

C. Contact order and folding kinetics 

After determining Tf id(N) we ran a MC folding simu- 
lation for every designed sequence. The simulations pro- 
ceeded until T max (N) MC steps or until folding was ob- 
served. The value of T max (N) was chosen such that it was 
much longer than the typical folding time of the studied 
sequences. 

Fig. 3 shows the dependence of the folding time, t, on 
the contact order parameter for chain lengths TV = 36 and 
TV = 48. The folding time was computed as the mean first 
passage time averaged over 100 simulation runs. In either 
case the points are close to be uniformly distributed sug- 
gesting no correlation between the CO, and the folding 
time for these chain lengths. Here and elsewhere error 
bars indicate ± one standard error in the mean. 

Figures 4(a)- (c) show the dependence of foldicity on 
CO for N = 54, 64 and N = 80 respectively. The results 
presented in Figures. 4(d)-(f) show, for the same chain 
lengths, the dependence of the estimated folding time 
on the relative contact order paramenter. Two distinct 
scenarios emerge from the analysis of the graphs: 

1. For CO < 0.17 there is no correlation between 
foldicity (or folding time) and the relative contact 
order parameter; 

2. For CO > 0.17, a general trend towards decreas- 
ing foldicity with increasing relative contact order 
can be observed. In this regime, a considerably 
strong positive correlation of r — 0.70, 0.70 and 
0.79, between t and CO, shows up for chain lengths 
N = 54, 64 and 80 respectively. 

IV. DISCUSSION AND CONCLUSIONS 

The 'turning point' value of CO — 0.17 is actually 
the peak of the homopolymer relaxation histogram dis- 
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N < E > a 

"36 -15.8722 0.0194 

48 -23.1407 0.0331 

54 -28.8028 0.0158 

64 -35.0811 0.0477 

80 -46.2858 0.0375 



TABLE I. Average sequence energy under the training scheme of Shakhnovich and Gutin. For each chain length, N, the 
average is computed over the total number of sequences (w 2000) designed for each set of targets, a is the standard deviation 
of the mean. 



tribution as previously discussed. This means that CO 
and folding time are positively correlated only for pro- 
teins with predominantly non-local contacts. We inter- 
pret this result as a consequence of the properties of the 
move set used to explore the conformational space to- 
gether with the ruggedness of the energy landscape. As 
seen in section [II A, kink-jump dynamics does not favour 



the formation of high CO structures in homopolymers. In 
proteins, when the native structure is of high CO, it will 
be difficult to escape from kinetic traps associated with 
local energy minima and structures of lower CO. This 
confirms and explains our previous findingsO according to 
which the folding performance achievable is strongly sen- 
sitive to target conformation for chain lengths N > 80. 

The comparison of the simulation's results with the ex- 
perimental data set of 24 two-state proteins, with chain 
length ranging from 41 to 154 amino acids, reported in 
Ref.3 is hindered by the fact that the proteins consid- 
ered in Ref.3 fail to exhibit the scaling of folding time 
with chain length which is typical of lattice model simu- 
lations. However, a strong correlation (r = 0.80) is also 
found between CO and the folding times. Moreover, this 
correlation is considerably improved (r = 0.97) if only 
long protein chains (N > 80) are considered. 

As a general conclusion, we might say that results on 
lattice models encourage the idea that the contact order 
of the native structure plays a significant role in deter- 
mining the folding rate. The match with the correla- 
tion between the CO and the folding time found from 
the analysis of experimental data suggests that lattice 
polymer dynamics with local moves does capture the key 
dynamical features of real protein folding. 

It would be interesting to know if similar results can 
be found in the scope of 'off-lattice' models, where one 
would expect proteins with high helical content to be 
better folders. 
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FIG. 2. Dependence of the folding probability, Pf„id, on 
log(t) at three different temperatures. Pfoid was computed 
as the number of folding simulations which ended up to time 
t normalised to the total number of attempted runs. For 
each curve ^2000 simulations were used distributed acroos 
the available 48 bead long targets and sequences. 
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FIG. 3. Dependence of the folding time, t, on the relative 
contact order, CO, for 36 and 48 bead long targets. 
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FIG. 1. (a)-(e) Histograms of the distribution of the relative contact order parameter; for each studied chain length, N, a 
sample of 400 targets was considered, (f) Shows the dependence of the collapsing time, t co i, on CO. t co i was computed as the 
mean first passage time (FPT) averaged over 400 simulation runs. 
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FIG. 4. (a)-(c)Dependence of foldicity on the relative contact order, CO, for 54, 64 and 80 bead long targets, (d)-(f) show 
the dependence of the (estimated) folding time, t, on CO. 
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